npj Genomic Medicine — Latest Matching Preprints

1

Targeted BRCA1/BRCA2 Sequencing in a Bangladeshi Clinically Referred Cohort Identifies Candidate BRCA1 Loss-of-Function Variants and a Multi-Exon Deletion-Like CNV Signal

Al Sium, S. M.; Banu, T. A.; Goswami, B.; Naser, S. R.; Habib, M. A.; Akter, S.; Ara, M. H.; Al Din, S. M. S.; Nafisa, A.; Nayem, M. R.; Rabbi, M. F. A.; Sarkar, M. M. H.; Khan, M. S.

2026-05-20 oncology 10.64898/2026.05.11.26352643 medRxiv

Top 0.1%

17.3%

Background: Population-relevant BRCA1/BRCA2 data from Bangladesh are scarce, creating challenges for hereditary breast and ovarian cancer variant interpretation, counseling, and follow-up testing. We examined a clinically referred Bangladeshi cohort to characterize assay-derived BRCA1/BRCA2 short variants, sequencing-depth performance, and copy-number findings in a conservative pilot framework. Methods: Twenty-three de-identified blood-derived DNA samples were assessed using a targeted BRCA1/BRCA2 next-generation sequencing workflow. Downstream analysis used assay-generated short-variant, coverage, and CNV outputs, with coordinates reported on hg19/GRCh37. Short variants were evaluated from high-confidence PASS/VCC-H calls, and CNV review incorporated both target-region and amplicon-level copy-number patterns. Results: After removal of four low-VAF review observations, the primary germline-compatible dataset comprised 304 short-variant observations representing 34 unique variants. Both BRCA1 and BRCA2 contributed comparable variant burdens, while the overall profile was mainly composed of missense and synonymous changes. Six sample-specific heterozygous BRCA1 truncating candidates were observed, including five frameshift variants and one stop-gain variant. Protein-level mapping placed these events across the central-to-C-terminal portion of BRCA1. Sequencing depth was consistently high across the targeted regions, with all 4,255 amplicon-sample measurements exceeding 280x and 99.91% reaching at least 500x. Copy-number analysis highlighted one candidate BRCA1 multi-exon deletion-like event involving exons 15-20 in BCSIR-BRCA-21, with unresolved partial exon 14 involvement. Conclusions: This study provides an initial Bangladesh-focused targeted BRCA1/BRCA2 dataset and identifies candidate short-variant and CNV findings for validation. 
These findings should be interpreted as analytical candidates only and require confirmatory testing and expert clinical curation before any clinical application. The cohort is referral-enriched and should not be used to infer population prevalence.

2

The Genetic Landscape and Epidemiological Characteristics of Inherited Retinal Diseases in the Chinese Population

Zeng, B.; Cui, Z.; Zhou, S.; Dai, W.

2026-05-29 ophthalmology 10.64898/2026.05.27.26354224 medRxiv

Top 0.1%

17.2%

Background: Inherited Retinal Diseases (IRDs) are a group of genetically heterogeneous blinding conditions. Major global genomic reference databases are disproportionately enriched for individuals of European ancestry. This underrepresentation creates a significant bias that impedes the accuracy of genetic diagnosis in the Chinese population. This study aims to address this limitation by constructing a comprehensive genetic landscape of IRDs using large-scale deep-sequencing data from a large Chinese cohort. Methods: The study leveraged variant data primarily from 10,588 individuals in the China Metabolic Analytics Project (ChinaMAP) and cross-referenced findings against multiple national and international databases. We systematically curated variants within a targeted panel of 291 IRD-associated genes. Variant pathogenicity was assessed using a comprehensive pipeline integrating InterVar-automated classification based on 2015 American College of Medical Genetics and Genomics/Association for Molecular Pathology (ACMG/AMP) guidelines, ClinVar evidence (review status ≥ 1 star), and manual literature curation. We delineated the mutational spectrum, identified population-enriched pathogenic/likely pathogenic (P/LP) variants, and analyzed the distribution characteristics of IRD-associated highly-mutated genes. Furthermore, we calculated the carrier frequencies (CF) and genetic prevalence (GP) of autosomal recessive (AR)-IRD genes in the Chinese population. Results: The study revealed a highly concentrated genetic landscape for AR-IRDs in the Chinese population, with ABCA4 and USH2A emerging as the primary drivers of the genetic burden. This finding aligns with previous Chinese cohorts but contrasts with global databases, where genes such as the X-linked RPGR are more prevalent. In contrast, autosomal dominant (AD)-IRDs exhibited high locus heterogeneity, with pathogenic variants dispersed across numerous genes (e.g., COL2A1 and MFN2). 
We identified a series of P/LP variants that were either high-frequency or significantly enriched in the Chinese population, such as CNGB1 (p.P530R) and specific recurrent alleles in ABCA4 and CYP4V2. The estimated cumulative CF for AR-IRDs was 1 in 5.60, and the theoretical total GP was 1 in 2,624.67, based on the ChinaMAP data. Conclusion: By integrating the ChinaMAP dataset with diverse genomic resources, this study provides a genetic landscape of IRDs in the Chinese population. Our analysis shows a concentrated mutational spectrum in AR-IRDs, contrasting with the pronounced heterogeneity in AD-IRDs. These findings, including population-specific pathogenic variants and refined prevalence estimates, provide a resource for precision diagnostics, genetic counseling, expanded carrier screening (ECS), and public health policy development in China.
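
The abstract reports a cumulative carrier frequency and a theoretical genetic prevalence without spelling out the arithmetic. The sketch below shows one common way such figures can be derived, assuming Hardy-Weinberg equilibrium and independence between genes. The gene names and allele frequencies are hypothetical placeholders, not values from the study, and the authors' actual pipeline may differ.

```python
from math import prod


def gene_carrier_frequency(q: float) -> float:
    """Heterozygote (carrier) frequency 2q(1-q) for a gene whose P/LP
    alleles have combined frequency q (Hardy-Weinberg assumption)."""
    return 2 * q * (1 - q)


def gene_genetic_prevalence(q: float) -> float:
    """Expected frequency q^2 of individuals with two P/LP alleles in one gene."""
    return q * q


def cumulative_carrier_frequency(qs) -> float:
    """Probability of carrying a P/LP allele in at least one gene,
    treating genes as independent (an assumption of this sketch)."""
    return 1 - prod(1 - gene_carrier_frequency(q) for q in qs)


def total_genetic_prevalence(qs) -> float:
    """Sum of per-gene genetic prevalences across the gene panel."""
    return sum(gene_genetic_prevalence(q) for q in qs)


def one_in(frequency: float) -> float:
    """Express a frequency as the N in '1 in N'."""
    return 1 / frequency


# Hypothetical combined P/LP allele frequencies per gene (illustration only).
example_q = {"GENE_A": 0.01, "GENE_B": 0.005, "GENE_C": 0.002}

if __name__ == "__main__":
    qs = list(example_q.values())
    print(f"cumulative CF: 1 in {one_in(cumulative_carrier_frequency(qs)):.2f}")
    print(f"total GP: 1 in {one_in(total_genetic_prevalence(qs)):.2f}")
```

Reporting the result as "1 in N" mirrors the form used in the abstract; the independence assumption slightly overstates the cumulative carrier frequency if alleles in different genes are correlated.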

3

In vitro splice-switching oligonucleotide rescues aberrant GFM2 pseudoexon inclusion and restores mitochondrial activity

Gross, S.; Birnbaum, R.; Shaul Lotan, N.; Mor-Shaked, H.; Manor, J.; Shaag, A.; Rosenbluh, C.; Levy-Memo, A.; Yanovsky-Dagan, S.; Saada, A.; Harel, T.

2026-06-01 genetic and genomic medicine 10.64898/2026.05.28.26354078 medRxiv

Top 0.1%

10.0%

Background: Biallelic variants in GFM2, encoding mitochondrial elongation factor G2 (mtEFG2), a GTPase involved in the termination stage of mitochondrial translation, cause autosomal recessive combined oxidative phosphorylation deficiency. Noncoding structural variants may be missed by exome sequencing but can disrupt splicing and provide opportunities for variant-specific therapeutic rescue. We investigated the molecular mechanism underlying suspected Leigh syndrome in an infant with mitochondrial disease and evaluated whether splice-switching oligonucleotide (SSO) treatment could correct the pathogenic splicing defect. Methods: The proband underwent exome sequencing followed by short-read and long-read whole genome sequencing. RNA sequencing, reverse-transcription PCR, quantitative PCR, and cycloheximide treatment were used to characterize the effect of the identified intronic duplication on GFM2 splicing and transcript stability. Patient-derived fibroblasts were treated with SSOs targeting the aberrant splice junction. Rescue was assessed by RNA studies, western blotting, and spectrophotometric measurement of cytochrome c oxidase (COX). Results: Whole genome sequencing identified a paternally inherited GFM2 missense variant, NM_032380.5:c.2195C>T p.(Pro732Leu), in trans to a maternally inherited 221-nucleotide intronic duplication, NM_032380.5:c.2029-741_2029-521dup. RNA studies revealed an 87-nucleotide pseudoexon, generated by activation of a cryptic acceptor splice site within the duplicated sequence. The resulting transcript harbored a premature termination codon (PTC) and underwent nonsense-mediated decay, as confirmed by cycloheximide rescue. Together with reduced mtEFG2 protein levels on western blot, the findings supported a loss-of-function mechanism. 
Enzymatic analysis of affected fibroblasts showed reduced activity of the mtDNA-dependent complex IV subunit COX, with preservation of the nuclear-encoded complex II enzyme succinate dehydrogenase and the control enzyme citrate synthase, consistent with impaired mitochondrial translation. An SSO targeting the aberrant intron-pseudoexon junction nearly abolished pseudoexon inclusion, restored correctly spliced GFM2 transcript from the duplication-containing allele, increased mtEFG2 protein levels, and significantly improved COX activity. Conclusions: This study identifies a pathogenic intronic GFM2 duplication that causes mitochondrial disease through pseudoexon activation and nonsense-mediated decay. The findings demonstrate the value of integrated genome and transcriptome analysis for exome-negative mitochondrial disease and provide in vitro proof of concept that SSOs can restore transcript processing, protein expression, and mitochondrial respiratory-chain function in patient-derived cells.

4

Stratified evaluation of blood RNA sequencing in a rare disease cohort

Duzenli, T.; Durmus, S.; Kaya, H. E.; Sevilgen, F. E.; Kayhan, G.; Cakir, T.; Ergun, M. A.

2026-05-28 genetic and genomic medicine 10.64898/2026.05.27.26353804 medRxiv

Top 0.1%

9.8%

Background: RNA sequencing (RNA-seq) is increasingly recognized as a complementary tool to DNA-based sequencing for improving the diagnostic yield in Mendelian disorders. However, how the diagnostic performance of RNA-seq varies across molecularly and phenotypically distinct patient subgroups remains poorly defined. This study aimed to evaluate and compare the diagnostic utility of RNA-seq across three stratified groups of patients with non-diagnostic exome sequencing. Methods: We performed RNA-seq on whole blood samples from 90 patients with suspected Mendelian disease in whom clinical exome or whole-exome sequencing had failed to establish a molecular diagnosis. Patients were prospectively stratified into three groups of 30: (i) patients with a candidate variant of uncertain significance (VUS) with predicted splicing impact (Group 1), (ii) patients with a specific clinical pre-diagnosis but no identified pathogenic variant (Group 2), and (iii) patients without a specific pre-diagnosis or candidate variant (Group 3). Aberrant splicing, gene expression outliers, and allele-specific expression were analyzed using multiple bioinformatic tools and compared against a GTEx-derived control cohort. Results: RNA-seq contributed to a molecular diagnosis in 29 of 88 evaluable patients (32.9%). Diagnostic yield differed substantially across groups: 82.8% (24/29) in Group 1, 6.9% (2/29) in Group 2, and 10% (3/30) in Group 3. In Group 1, RNA-seq enabled reclassification of candidate VUS through direct demonstration of aberrant splicing events. In Group 2, RNA-seq identified a somatic mosaic ACTB variant missed by exome sequencing and reclassified a previously deprioritized APPL1 VUS. 
In Group 3, a deep intronic pseudoexon-activating variant in IGBP1 was identified in two siblings with severe microcephaly, providing evidence for a candidate X-linked microcephaly gene, and a pathogenic RNU4-2 variant was detected in a patient with ReNU syndrome, a non-protein-coding gene not captured by standard exome sequencing. Conclusions: RNA-seq has the highest diagnostic utility when applied to evaluate candidate splice variants identified by prior DNA testing but also provides independent diagnostic value in patients without candidate variants. The systematic comparison across stratified patient groups supports the integration of RNA-seq into clinical genomic workflows and highlights the need for standardized analytic frameworks.

5

Differential causative effects of germline pathogenic variants in MUTYH and PALB2 in a patient with colorectal polyposis and breast cancer

Camacho Valenzuela, J.; Pelletier, D.; Polak, P.; Fu, L.; Hamel, N.; Domecq, C.; Ahmed, A.; Robles-Espinoza, C. D.; Foulkes, W. D.

2026-05-25 genetic and genomic medicine 10.64898/2026.05.15.26352890 medRxiv

Top 0.1%

6.3%

Purpose: Patients carrying Germline Pathogenic Variants (GPVs) in multiple cancer susceptibility genes (CSGs) can be described within the context of Multi-locus Inherited Neoplasia Allele Syndrome (MINAS). The role of each GPV is typically interpreted based on clinical phenotypes. Here, we used tumor sequencing, particularly mutational signatures, to investigate the contribution of GPVs in MUTYH and PALB2 to colorectal polyposis and breast cancer in a single patient at a molecular level. Methods: We analyzed tumor sequencing data, including mutational signatures and genomic scars, of a breast tumor and a colorectal polyp from a patient with biallelic GPVs in MUTYH and a heterozygous GPV in PALB2. Results: The colorectal polyp showed a dominant contribution of MUTYH-associated Base Excision Repair deficiency (BERd) mutational signatures, with no evidence of Homologous Recombination Repair Deficiency (HRD). In contrast, the breast tumor showed both MUTYH-driven BERd and HRD-associated signatures, including SBS3, ID6 and an elevated HRD score, despite the absence of a detectable second hit in PALB2. These findings suggest a differential contribution from the CSGs, with MUTYH contributing to both lesions and PALB2 contributing specifically to the breast tumor. The observed pattern does not align with the additive or synergistic models described in MINAS. Conclusions: Our study provides evidence that mutational signatures can elucidate the contribution of multiple CSGs to tumorigenesis within a single patient. These findings extend current interpretations of MINAS beyond additive or synergistic phenotypes, which may help to better understand tumor etiology, with potential clinical implications, including eligibility for targeted therapies.

6

Pharmacogenetic Characterization of Cytochrome P450 Genes involved in Psychotropic Medication Metabolism in a Cohort of Patients with Prader-Willi Syndrome

Moreno-Armengol, A.; Pareja, R.; Hernandez-Lazaro, A.; Capel, L.; Corripio, R.; Caixas, A.; Baena, N.

2026-05-18 pharmacology and therapeutics 10.64898/2026.05.09.26352521 medRxiv

Top 0.1%

6.3%

Prader-Willi syndrome (PWS) is a rare multisystemic disorder characterized by obesity, endocrine dysfunction, and psychiatric comorbidities, which lead to frequent use of psychotropic medications. Patients often show atypical responses to standard dosages of psychiatric drugs. Pharmacogenetics could partly explain these responses and may offer a valuable tool for individualized treatment. This study analyzed allelic and phenotypic frequency distributions of five of the main cytochrome P450 enzymes (CYP2D6, CYP2B6, CYP2C19, CYP2C9, CYP3A4) involved in psychiatric drug metabolism in 47 patients with genetically confirmed diagnosis of PWS and compared them to reference frequencies in the general European population. Allelic frequency comparisons between the European reference population and the overall PWS cohort revealed a significant global difference for CYP2B6, with CYP2C19 and CYP2D6 showing trends toward significance. Although no global allelic differences remained significant after false discovery rate correction, post-hoc analyses consistently identified an enrichment of reduced- or non-functional alleles CYP2B6*19 and CYP2D6*10 in patients with PWS. Predicted metabolizer phenotype analyses showed a significant shift toward intermediate metabolizers of CYP3A4 in the PWS cohort, with corresponding depletion of normal metabolizers. Subgroup analyses indicated that allelic differences were more pronounced in maternal uniparental disomy and non-deletion subtypes, particularly for CYP2B6, although no significant differences were observed between PWS genetic subtypes. Overall, the results suggest potential differences in metabolizing activity in patients with PWS, with possible implications for drug efficacy and tolerability. These results support the idea that pharmacogenetic testing may improve therapeutic decision-making in PWS for psychiatric treatment. Larger studies are needed to confirm these preliminary results.

7

Documented clinical genetic testing among carriers of hereditary breast and ovarian cancer variants: Ancestry and socioeconomic disparities in the All of Us research program

Yerukala Sathipati, S.; Scott, H.

2026-06-10 oncology 10.64898/2026.06.09.26355262 medRxiv

Top 0.1%

5.0%

Importance: Hereditary breast and ovarian cancer (HBOC) variant carriers benefit from risk-reducing interventions, but only if identified. The extent to which carriers are clinically recognized, and whether recognition is equitable across diverse populations, is poorly characterized in a single large U.S. cohort. Objective: To estimate P/LP HBOC carrier prevalence across genetic ancestry groups, quantify documented clinical genetic testing among carriers, and evaluate ancestry and socioeconomic disparities in testing. Design, Setting, and Participants: Cross-sectional analysis of the All of Us Research Program Controlled Tier (Curated Data Repository v8/C2024Q3R9), comprising participants with short-read whole genome sequencing and linked electronic health record (EHR) and survey data. Carriers were ascertained from research genomic data independent of clinical testing. Exposures: Genetically inferred ancestry (African [AFR], Admixed American [AMR], East Asian [EAS], European [EUR], Middle Eastern [MID], South Asian [SAS]); self-reported household income and educational attainment. Main Outcomes and Measures: (1) Carrier prevalence with Wilson 95% CIs; (2) documented clinical genetic testing (procedure codes) among carriers; (3) adjusted odds of documented testing among women, by ancestry, before and after socioeconomic adjustment, using multivariable logistic regression. Results: Among 414,830 participants, P/LP HBOC carrier prevalence was 1.42% (95% CI, 1.38-1.45) overall and similar across ancestry groups (AFR 1.24%, AMR 1.32%, EAS 1.19%, EUR 1.52%, MID 1.68%, SAS 1.33%; overlapping CIs). Among 250,071 women in the testing analysis, documented clinical genetic testing was rare: only 74 of 5,878 carriers overall (1.3%) and 59 of 3,572 European-ancestry carriers (1.7%) had a documented test, with counts below reportable thresholds in all other ancestry groups. 
African-ancestry women had lower adjusted odds of documented testing than European-ancestry women (Model 1 adjusted odds ratio [aOR], 0.32; 95% CI, 0.27-0.39), an association that attenuated but persisted after adjustment for income and education (Model 2 aOR, 0.48; 95% CI, 0.40-0.58; P < 0.001); Admixed American women also had reduced adjusted odds (aOR, 0.71; 95% CI, 0.61-0.84). Lower income and lower education were independently and dose-dependently associated with lower testing odds (income <$25,000 aOR, 0.46; high-school education aOR, 0.54). Conclusions and Relevance: High-risk HBOC variant carriers are present across all ancestry groups at similar frequencies, yet documented clinical genetic testing was disparate in the different ancestry groups. African-ancestry women experience a testing gap that is not fully explained by socioeconomic position, implicating structural barriers in access and referral. Population-level strategies that decouple carrier identification from current referral pathways may be required to close this gap.
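
The abstract reports prevalence with Wilson 95% CIs. As an illustration of that interval, the following minimal sketch implements the standard Wilson score formula and applies it to the 74 of 5,878 carriers with a documented test quoted above. The function name and the choice to clamp to [0, 1] are assumptions of this sketch, not details taken from the study's analysis code.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95):
    """Wilson score interval for a binomial proportion.

    Returns (lower, upper) bounds on the true proportion.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")

    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / n

    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom

    # Clamp to guard against tiny floating-point overshoot at the boundaries.
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    # Counts quoted in the abstract: 74 of 5,878 carriers with a documented test.
    lower, upper = wilson_interval(74, 5878)
    print(f"{74 / 5878:.2%} (95% CI, {lower:.2%}-{upper:.2%})")
```

Unlike the simple normal-approximation interval, the Wilson interval stays inside [0, 1] and behaves sensibly for rare outcomes, which suits proportions near 1%.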

8

Rare neurological and neurodevelopmental variants in ALS link to onset, survival and family history

O'Donoghue, C.; Kacar, E.; Gomes, T.; Costello, E.; Pender, N.; Peelo, C.; Ryan, M.; Heverin, M.; Byrne, S.; Bede, P.; Hardiman, O.; McLaughlin, R. L.; Byrne, R. P.

2026-06-10 genetic and genomic medicine 10.64898/2026.06.09.26354977 medRxiv

Top 0.1%

4.0%

Background: Neurological, neuropsychiatric, and neurodevelopmental disorders cluster in ALS families, sharing a common genetic architecture with ALS. Pathogenic variants in genes associated with other neurological, neurodevelopmental, or neuropsychiatric disorders may also co-occur in ALS and modify phenotype. We have sought to determine the prevalence and clinical pattern of likely-pathogenic/pathogenic (LP/P) non-ALS neurological, neurodevelopmental, and neuropsychiatric variants, alone and in combination with ALS-gene variants, in two large ALS cohorts. Methods: Whole-genome sequencing (WGS) of 469 Irish and 774 Answer ALS people with ALS (pwALS) was analysed for ClinVar LP/P variants associated with other neurological (n = 15541), neurodevelopmental (n = 9761), and neuropsychiatric (n = 321) phenotypes. Inheritance patterns for associated genes (autosomal recessive/autosomal dominant) along with the associated phenotype were validated using OMIM. Standardised clinical data included family history, site and age of onset, El Escorial category, survival, motor decline, and cognitive and behavioural assessments. Known ALS-gene variants and C9orf72 repeat expansion status were included for each cohort. Results: Non-ALS neurological variants were identified in 47/469 (10.0%) Irish and 69/774 (8.9%) Answer ALS participants, most frequently in hereditary spastic paraplegia-associated genes (3.2% Irish; 2.8% Answer ALS). Irish neurological variant carriers showed higher frequency of respiratory onset (10.6% vs 1.2%, Fisher's exact p = 0.002, Φ = 0.20) and fewer premorbid behavioural symptoms (0.92 ± 0.56 vs 3.08 ± 0.97, Cohen's d = -0.40). Neurodevelopmental variants occurred in 12/469 (2.6%) Irish and 20/774 (2.6%) Answer ALS participants. 
In the Irish cohort, neurodevelopmental variant carriers had significantly shorter survival in Cox proportional hazards model (log-rank p = 0.005), corresponding to a more than two-fold increased hazard of death (HR = 2.25, 95% CI 1.26-4.00), and had significantly increased familial burden of neuropsychiatric disorders among first- and second-degree relatives (negative binomial IRR for carriers = 2.41, 95% CI: 1.12-5.18, p = 0.025). Across combined cohorts, 18 individuals (Irish n = 8; Answer ALS n = 10) carried ≥2 LP/P variants spanning ALS and non-ALS genes. Conclusion: Rare LP/P variants in genes associated with other neurological and neurodevelopmental disorders occur in up to 12% of pwALS across two independent cohorts. Carriers show distinct phenotypes, shorter survival, and characteristic family history patterns. These findings suggest that extended pleiotropic and oligogenic architectures may contribute to ALS heterogeneity.

9

Phenotype-Specific Recalibration of MAVE Data Enables Repurposing of BAP1 Functional Assays for Kury-Isidor Syndrome

Gupta, P.; Balton, E. V.; Tejura, M.; Kumar, R. D.; Snyder, M. W.; Stone, J.; Villani, R. M.; Peter, B. H.; Sirisak, C.; Ian, G. A.; Martha, H.-P.; Danny, M. E.; Jane, R.; Elisabeth, R. A.; Andrew, S. H.; Mark, W.; Undiagnosed Diseases Network (UDN); Kathleen, L. A.; Matthew, B. D.; Melissa, M. J.; Gail, J. P.; Katrina, D. M.; Elizabeth, B. E.; Fowler, D. M.; Starita, L. M.; McEwen, A. E.; Stergachis, A. B.

2026-05-21 genetic and genomic medicine 10.64898/2026.05.15.26352805 medRxiv

Top 0.1%

3.9%

Purpose: Multiplexed assays of variant effect (MAVEs) are transforming clinical variant interpretation. However, many genes are associated with more than one disease, making it unclear whether functional data generated in one disease context may be directly applicable to another. For example, germline BAP1 missense variants are associated with both BAP1 tumor predisposition syndrome (BAP1-TPDS) and Kury-Isidor syndrome (KURIS), a rare neurodevelopmental disorder. Here, we demonstrate how phenotype-specific calibration of BAP1 MAVE data enables disease-specific variant classification. Methods: Saturation genome editing (SGE) data for BAP1 were recalibrated using either BAP1-TPDS- or KURIS-associated missense variants as pathogenic controls. Functional evidence strength was quantified using the Odds of Pathogenicity (OddsPath) framework and mapped to ACMG/AMP PS3/BS3 criteria. Recalibrated functional evidence was integrated with standard clinical criteria for variant classification. A workshop was developed to teach phenotype-specific MAVE recalibration to clinicians and variant curators and evaluated for educational impact. Results: Phenotype-specific recalibration using BAP1-TPDS and KURIS controls yielded OddsPath values consistent with PS3_Strong evidence in both contexts. Application of KURIS-specific recalibration enabled the diagnosis of KURIS in an individual with a previously uncertain BAP1 missense variant. The educational workshop enabled quantitatively improved understanding in applying functional evidence. Conclusion: Phenotype-specific recalibration enables appropriately calibrated reuse of MAVE datasets across distinct disease contexts, increasing the clinical utility of MAVE datasets and the interpretability of variants in pleiotropic genes. This framework expands the diagnostic utility of existing functional datasets without requiring new experimental assays.
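
The OddsPath step mentioned in the Methods can be illustrated with a short sketch. It uses the commonly cited ClinGen formulation, OddsPath = [P2 × (1 − P1)] / [(1 − P2) × P1], where P1 is the proportion of pathogenic variants among all control variants and P2 is the proportion of pathogenic variants among controls with a functionally abnormal readout. The evidence thresholds below follow the commonly cited ClinGen recommendation and should be checked against current guidance; the control counts in the example are hypothetical and are not the BAP1 calibration data.

```python
def odds_path(prior: float, posterior: float) -> float:
    """Odds of pathogenicity from a prior proportion (P1) and a
    posterior proportion (P2), both strictly between 0 and 1."""
    if not (0 < prior < 1 and 0 < posterior < 1):
        raise ValueError("prior and posterior must be strictly between 0 and 1")
    return (posterior * (1 - prior)) / ((1 - posterior) * prior)


def evidence_label(odds: float) -> str:
    """Map an OddsPath value to a PS3/BS3 strength label.

    Thresholds are assumed from the commonly cited ClinGen recommendation.
    """
    if odds > 350:
        return "PS3_VeryStrong"
    if odds > 18.7:
        return "PS3_Strong"
    if odds > 4.3:
        return "PS3_Moderate"
    if odds > 2.1:
        return "PS3_Supporting"
    if odds < 0.053:
        return "BS3_Strong"
    if odds < 0.23:
        return "BS3_Moderate"
    if odds < 0.48:
        return "BS3_Supporting"
    return "Indeterminate"


if __name__ == "__main__":
    # Hypothetical control set: 10 pathogenic and 10 benign variants.
    n_pathogenic, n_benign = 10, 10
    prior = n_pathogenic / (n_pathogenic + n_benign)

    # Hypothetical assay result: all 10 pathogenic controls read out as
    # abnormal, plus 1 benign control counted as misclassified so that
    # the posterior stays below 1.
    posterior = 10 / (10 + 1)

    odds = odds_path(prior, posterior)
    print(f"OddsPath = {odds:.1f} -> {evidence_label(odds)}")
```

Changing which variants serve as pathogenic controls changes both P1 and P2, which is why the same assay scores can support different evidence strengths for different phenotypes.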

10

Detecting genomic regions enriched for reciprocal recombination in autism spectrum disorder

Mahoney, C. F.; Salter-Townshend, M.; Fitzpatrick, D. J.; Shields, D. C.

2026-05-27 genetics 10.64898/2026.05.26.727863 medRxiv

Top 0.1%

3.9%

Meiotic recombination is an important means of increasing genetic diversity by generating novel haplotypes in a population. Recombination separates linked loci extremely slowly in some regions, therefore genetic variants in high linkage disequilibrium may become co-adapted. Reciprocal recombination that separates co-adapted variants may generate a deleterious de novo haplotype that contributes to disease. We developed statistical methods to detect genomic regions of recombination excess in two different family-based study designs. We identified recombination in the Simons Simplex Collection in 273 simplex families with one child with autism spectrum disorder (ASD) and at least two unaffected children, in which recombinations can be mapped to the proband and contrasted with the recombination counts in unaffected siblings; and in 1,802 families with two children, where the number of recombinations identified can be contrasted with the expectation from a reference recombination map. Both strategies revealed a tail of low p-values for loci of interest that contrasted with the rest of the distribution. Permutation and bootstrap tests did not identify genome-wide primary findings in either cohort, but the most significant three-child cohort locus of recombination excess (between cadherin genes CDH4 and CDH26) replicated in the two-child cohort (p=0.01). While this replication strategy was not defined a priori, five of the most recombination enriched bins identified candidate ASD genes (p=0.02; WWOX, ADAMTS16, INSR, ADARB2, and HS6ST1). Since the six identified loci were not identified as regions of high de novo copy number variation in the study cohort and no CNVs were detected in any of the recombinant probands in the identified regions, they represent candidates for reciprocal recombinations generating unfavourable haplotypes for these genes. 
This study highlights a previously unidentified source of clinical genetic variability contributing to the molecular aetiology of ASD. AUTHOR SUMMARY: Autism spectrum disorder (ASD) is a constellation of neurodevelopmental disabilities characterised by deficits in social communication and repetitive patterns of behaviour. While ASD is highly heritable, its genetic basis is complex and poorly understood. While some highly penetrant types of genetic variation have been identified, most people with ASD carry a large number of variants that each contribute a small amount to their overall phenotype. In addition to mutations in individual genes, changes in the configuration of genes along a chromosome may contribute to ASD. Here, we describe a method for identifying regions where such new configurations have occurred through recombination and attempt to find regions where such changes are more common in autistic children than in their non-autistic siblings. We explore recombination as a source of genetic variation contributing to autism, which has potential to inform clinicians in providing services to autistic people and their families.

11

Large-scale association study identifies lung cancer susceptibility copy number variants and their potential functional role in genetic instability

Xiao, F.; Qin, F.; Luo, X.; Slewitzke, S. E.; Fernandes, G. F.; Johansson, M.; Xiao, X.; Zaridze, D.; Bojesen, S. E.; Shete, S.; Albanes, D.; Aldrich, M. C.; Tardon, A.; Fernandez-Tardon, G.; Le Marchand, L.; Rennert, G.; Bickeböller, H.; Wichmann, H.-E.; Risch, A.; Muley, T.; Rosenberger, A.; Field, J. K.; Davies, M.; Woll, P.; Kiemeney, L. A.; Haugen, A.; Zienolddiny, S.; Lam, S.; Johansson, M.; Grankvist, K.; Schabath, M. B.; Andrew, A.; Lazarus, P.; Arnold, S. M.; Zhu, D.; Brenner, H.; Neuhouser, M. L.; Hung, R. J.; Christiani, D. C.; McKay, J.; Cai, G.; Xia, J.; Amos, C. I.

2026-05-15 genetic and genomic medicine 10.64898/2026.05.11.26352741 medRxiv

Top 0.1%

3.7%

Background: Genome-wide association studies (GWAS) have identified numerous lung cancer susceptibility loci based on single nucleotide polymorphisms (SNPs), yet a substantial proportion of heritability remains unexplained. We therefore evaluated germline copy number variants (CNVs) as an underexplored source of genetic susceptibility and potential contributors to genomic instability in lung cancer. Methods: We conducted a genome-wide analysis of germline CNVs using 19,342 cases and 15,917 controls from the Transdisciplinary Research in Cancer of the Lung (TRICL) consortium, with replication in two independent cohorts. High-confidence CNVs were identified by integrating two CNV callers including PennCNV and modSaRa2. Association analyses were performed using both gene-based and CNV region-based approaches. Polygenic risk scores (PRS) were constructed from top loci, and functional validation was conducted using siRNA-mediated knockdown in lung fibroblast cells. Results: We identified CNVs in four genomic regions (1p36.22, 2q31.2, 6p21.32, and 19q13.32) significantly associated with lung cancer risk. Two loci (1p36.22 and 2q31.2) were consistently supported across both analytical strategies. A CNV-based PRS constructed from key genes (CLCN6, NFE2L2, OPA3, and PSMB8) was significantly associated with lung cancer risk and replicated across independent datasets. Functional assays demonstrated that knockdown of NFE2L2 and OPA3 increased endogenous DNA damage, supporting a role in genomic stability. Conclusions: Germline CNVs contribute to lung cancer susceptibility and may influence carcinogenesis through mechanisms related to genomic instability. Impact: These findings expand the genetic architecture of lung cancer and highlight CNVs as potential biomarkers for improving risk stratification and informing precision prevention strategies.

12

Comprehensive analysis of de novo variants across 2,497 orofacial cleft trios reveals novel genetic drivers of disease

Kurtas, N. E.; Sanchis-Juan, A.; Shin, E.; Curtis, S. W.; Robinson, K. R.; Lee, A. S.; Alade, A. A.; Zhao, X.; Fu, J.; Diaz Perez, K. K.; Gowans, J. J. L.; Eshete, M. A.; Adeyemo, W. L.; Buxo, C. J.; Padilla, C. D.; Poletta, F. A.; Carreno Torres, A.; Wehby, G. L.; Hecht, J. T.; Moreno Uribe, L. M.; Mukhopadhyay, N.; Shaffer, J. R.; Weinberg, S. M.; Murray, J. C.; Beaty, T. H.; Butali, A.; Talkowski, M.; Marazita, M. L.; Leslie-Clarkson, E. J.; Brand, H.

2026-05-24 genetic and genomic medicine 10.64898/2026.05.21.26352934 medRxiv

Top 0.2%

3.3%

Background: Orofacial clefts (OFCs) and other palate abnormalities (PAs) are among the most common birth defects worldwide and are characterized by the abnormal formation of the lip and/or palate. Genetic studies have traditionally classified OFC cases as either syndromic, involving OFCs alongside other congenital anomalies, or nonsyndromic, which represent the majority of cases and occur in isolation. Emerging genomic evidence indicates that genes traditionally associated with syndromic forms of OFC can also harbor variants contributing to isolated cases, challenging the notion of a strict dichotomy between these categories and supporting their integration for gene discovery. Methods: In this study, we applied multiple analytic approaches to characterize the genetic architecture of OFC and PAs by integrating genomic data from 2,497 trios with an OFC (n=2,080) and PA (n=417) affected proband. We compared these findings across OFC subtypes and syndromic status with those from 5,515 control trios to identify enriched biological pathways and mechanisms and to prioritize candidate genes using variant burden testing. Results: We observed a significant enrichment of de novo protein-truncating and damaging missense variants in cases compared to controls (OR = 2.17, p = 1.21 × 10^-32), with particularly strong signals in biologically relevant gene sets involving OFC-associated, constrained, Mendelian disorder, and mouse candidate genes. Variant burden testing identified 39 OFC risk genes at FDR ≤ 0.05, which we then integrated with 593 established OFC genes to interrogate the functional underpinnings of OFC via network analysis. This analysis revealed 309 high-order interactor genes not previously associated with OFC. Notably, this OFC network clustered into ten distinct biological pathways, with nucleosome-associated genes showing significant enrichment among cases in our cohort (OR = 14.8, p = 8.1 × 10^-4). 
In a final integrative step, we combined evidence across all analyses to nominate 231 candidate genes, 32 of which contained at least two deleterious de novo variants in our cohort. Conclusions These findings underscore the value of integrating diverse OFC and PA subtypes, syndromic status, and variant classes to refine the genetic architecture of these disorders, highlighting both phenotypic expansion of known disease genes and the emergence of novel gene-phenotype associations.

13

Biallelic CYB5A disruptions in 46,XY Disorder of Sex Development: Identification and Characterization of a Novel Deep Intronic Variant

Moradifard, S.; LE, T. N. U.; Ha, N. T.; Dung, V. C.; Thao, B. P.; Harley, V. R.

2026-05-12 genetic and genomic medicine 10.64898/2026.05.05.26352416 medRxiv

Top 0.2%

2.9%


Background: The diagnostic yield for 46,XY disorders of sex development (DSD) remains limited. Whole-genome sequencing (WGS) improves detection of both coding and non-coding variants that may be missed by routine testing. Cytochrome b5, encoded by CYB5A, is an essential co-factor for CYP17A1-mediated 17,20-lyase activity. We report WGS of a Vietnamese family with 46,XY DSD in which two siblings presented with female external genitalia. Methods: Clinical assessment and hormone profiling were conducted. WGS was conducted on peripheral blood DNA from the two affected siblings, followed by variant annotation and ACMG-based classification. A minigene RNA splicing assay in HEK293 cells was used to evaluate the functional impact of the CYB5A intronic variant. Results: The patients' hormone profiles showed low testosterone and estradiol. WGS identified compound-heterozygous CYB5A variants: a paternally inherited missense variant (p.Val34Glu, likely pathogenic) and a maternally inherited deep intronic deletion (c.129+862_129+863del) for which SpliceAI predicted aberrant splicing. Minigene assays confirmed that the intronic deletion creates cryptic splice sites, resulting in pseudoexon inclusion and a premature stop codon, consistent with nonsense-mediated decay. The intronic variant meets ACMG criteria for pathogenicity. Conclusion: This family expands the spectrum of CYB5A-related DSD and demonstrates that compound-heterozygous variants, including deep intronic defects, can disrupt 17,20-lyase activity. These findings highlight the importance of WGS and functional assays for identifying clinically relevant non-coding variants in DSD.

14

Early pregnancy metabolomics and risk of offspring heart defects: a matched case-control study

Nastou, K.; Ottosson, F. A.; Schmidt, A.; Corn, G.; Geller, F.; Grundvad Boelt, S.; MacSween, N.; Wohlfahrt, J.; Lund, M.; Melbye, M.; Ernst, M.; Feenstra, B.

2026-05-12 epidemiology 10.64898/2026.05.08.26352715 medRxiv

Top 0.2%

2.8%


Congenital heart defects (CHDs) are the most common congenital malformations and often arise from perturbations during early embryonic development. Maternal metabolic disturbances in early pregnancy may contribute to CHD risk, but evidence from early first-trimester metabolomics studies is limited. We conducted an untargeted metabolomics case-control study using early first-trimester maternal plasma samples (gestational weeks 4-10) from the Danish National Birth Cohort. Metabolite profiling was performed via liquid chromatography-tandem mass spectrometry (LC-MS/MS) on 160 matched CHD case-control pairs (320 total samples). Conditional logistic regression and interaction analysis were used to identify metabolites associated with CHD risk or specific cardiac phenotypes. A total of 1,471 metabolite features were measured, of which 69 were associated with CHD at nominal significance (p < 0.05). These included a desaturated analog of sphingosine-1-phosphate (S1P), isoleucylproline, and an arginine-related metabolite. However, after false discovery rate correction for multiple testing, no metabolites remained significant. While these findings do not preclude subtle metabolic variation in early pregnancy among CHD cases, they underscore the challenges of biomarker discovery in this context. This work highlights the potential of early-pregnancy metabolomics for CHD biomarker discovery and points toward more targeted future studies with improved sample collection protocols, pre-specified pathway panels, and phenotype-homogeneous analyses to better capture the subtle metabolic variation that may underlie CHD risk.
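
Note on nominal versus FDR-corrected significance: the abstract above does not name its false discovery rate procedure. The minimal sketch below assumes the Benjamini-Hochberg step-up method, which is a common choice, and the function name and toy p-values are hypothetical and not taken from the study. It also shows that testing 1,471 features at p < 0.05 would yield about 74 nominal hits by chance alone if no feature were truly associated, which is close to the 69 reported.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if rejected at FDR level alpha.

    Assumed procedure (Benjamini-Hochberg step-up); the study's actual
    correction method is not stated in the abstract.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis whose rank is at or below that cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical toy p-values, for illustration only (not study data).
toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
nominal_hits = sum(p < 0.05 for p in toy_p)
fdr_hits = sum(benjamini_hochberg(toy_p))

# Expected number of nominal hits among 1,471 features if every null is true.
expected_nominal_under_null = 1471 * 0.05

print(nominal_hits, fdr_hits, round(expected_nominal_under_null, 2))
```

In the toy data, several p-values fall below 0.05 but only the smallest ones pass the rank-scaled threshold, which is the same pattern the abstract describes at larger scale.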

15

Explainable machine learning reveals an RBP regulatory logic of exon skipping

Raghav, Y.; Paul, A.; Anderson, R.; Karthyk, S.; Iturralde, A.; Vyas, J.; Dy, J.; Jones, B. C.; Castaldi, P. J.; Platig, J.

2026-05-30 systems biology 10.64898/2026.05.29.728731 medRxiv

Top 0.2%

2.3%


RNA binding proteins (RBPs) regulate the life cycle of an mRNA, often through RBP-RNA interactions. This life cycle includes splicing, whereby the intronic sequence of a pre-mRNA is removed and the exons are joined together. However, the patterns of RBP binding that lead to different splicing outcomes are still incompletely understood. Here, we build machine learning models from RBP-RNA binding and knockdown RNA-seq data for over 168 RBPs in two cell lines (HepG2 and K562) to better understand the binding patterns that predict exon skipping, the predominant form of alternative splicing in humans. We show that models trained exclusively on RBP binding patterns are indeed predictive and that a more sophisticated machine learning model (XGBoost) outperforms simpler linear models. In addition, we are able to extract a biologically interpretable logic embedded in these models. We show that SHAP, a machine learning explainability technique, captures activating and repressive behavior of RBP binding that is position-specific. In addition, we find that SHAP values are predictive of changes in unseen splicing events and that SHAP interactions between pairs of RBPs are predictive of protein-protein interactions. Our results demonstrate that using machine learning with interpretability techniques can reveal a regulatory logic of RBP binding. By estimating the impact of an RBP binding site on a splicing event, the SHAP values also provide a directly testable scientific hypothesis. We anticipate that models designed around biological processes and focused on interpretability will yield actionable biological insights both in splicing and genomics generally.

16

Tumoral Switch in NUMB splicing changes essential transcription pathways and induces malignant properties in tumour cells

Garcia-Heredia, J. M.; Carnero, A.; Ortega-Campos, S.

2026-05-19 cancer biology 10.64898/2026.05.15.725391 medRxiv

Top 0.2%

2.2%


Background: Recent evidence suggests that cancer can exhibit splicing alterations that give rise to tumour-specific isoforms. One example is NUMB, which produces four isoforms (p72, p71, p66, and p65) through alternative splicing of exons 3 and 9. Traditionally considered a tumour suppressor, NUMB has also been described as an oncogene. We propose that this duality is due to isoform-specific expression. Results: Using public databases, we identified a tumour-associated switch in NUMB isoform expression: p72/p71 are upregulated in tumours, whereas p66/p65 are more highly expressed in non-tumour tissues. These isoforms correlate differently with cellular processes. NUMBL, a NUMB homolog, behaves similarly to p65. We identified two transcriptional clusters: one characterized by high expression of p72/p71, and another by p66/p65/NUMBL. Each group was associated differently with the Notch, WNT/β-catenin, Hedgehog, and Hippo signalling pathways, suggesting isoform-specific regulatory roles. Analysis of breast cancer cell lines (CCLE) led to a NUMB score based on isoform expression, which classified cell lines into biologically distinct groups. The p72/p71-enriched group showed distinct signatures, pathway activity, and drug sensitivity. Applying this score to TCGA-BRCA samples revealed a significant link between high NUMB score and poor survival, confirmed by Kaplan-Meier analysis. Conclusions: NUMB emerges as a potential oncogenic contributor and biomarker in splicing-based personalised medicine, highlighting isoform-specific expression as a clinically relevant determinant of tumour behaviour, pathway activity, and therapeutic response.

17

Measuring the Meaning of Genomic Results: Harmonization of the Metric for Case-Level Results in the CSER2 Consortium

Powell, B. C.; Amendola, L. M.; Bonini, K. E.; Crosslin, D.; Desrosiers-Battu, L.; Hiatt, S. M.; Hindorff, L.; Kenny, E. E.; Mavura, Y.; Muenzen Ferar, K. D.; Risch, N.; Roman, T.; Slavotinek, A.; Van Ziffle, J.; Bowling, K. M.

2026-06-01 genetic and genomic medicine 10.64898/2026.05.28.26354388 medRxiv

Top 0.3%

1.9%


Yield of reported results from genetic testing provides a proximal measure of clinical usefulness. While ACMG/AMP guidelines provide representations of uncertainty for individual genetic variant classification, additional factors are considered when determining whether results explain a patient's presentation. To standardize cross-consortium analysis, a working group of the Clinical Sequencing Evidence-Generating Research (CSER2) consortium iteratively identified factors used when contextualizing variant-level results to case-level interpretation (i.e., interpretation of an individual's genetic data with respect to the indication for testing). Sites independently categorized results; complex cases were discussed collaboratively, leading to revision of classification categories. Our metric incorporates factors beyond classification of reported variants. Analogous to variant-level results, "Definitive Positive" and "Probable Positive" represent certainty that results may be clinically explanatory. The category "Inconclusive" applies when results may or may not fully explain the patient presentation, with subdivision into multiple (non-exclusive) subcategories. Cases falling outside all of the other categories are considered "Negative". The overall diagnostic yield by this metric and use of categories for inconclusive results varied by CSER project, in part paralleling study design differences. This case-level categorization provides a meaningful assessment of diagnostic yield, and for inconclusive cases identifies potentially resolvable factors for case resolution.

18

Calibrated high-throughput electrophysiology enables clinical interpretation of CACNA1G missense variants

Finol-Urdaneta, R. K.; Tan, C.-Y.; Maksemous, N.; Ma, J. G.; Lockhart, P.; Snell, P.; Malhotra, A.; Thompson, B. A.; Garg, G.; Goel, H.; Griffiths, L. R.; Adams, D. J.; Vandenberg, J. I.; Ng, C. A.

2026-05-18 neuroscience 10.64898/2026.05.10.724145 medRxiv

Top 0.3%

1.9%


Objective: Accurate classification of ion channel variants of uncertain significance (VUS) remains a persistent challenge in clinical genomics, limiting diagnostic resolution in neurological disorders. Methods: We developed a calibrated electrophysiological framework to generate functional evidence for clinical interpretation of CACNA1G variants encoding the low-voltage-activated calcium channel Cav3.1. Functional metrics derived from automated patch-clamp recordings were calibrated against benign/likely benign (B/LB) and pathogenic/likely pathogenic (P/LP) reference variants to enable conservative application of ACMG/AMP functional criteria within clinical variant interpretation workflows. Results: Calibration using 25 B/LB and 16 P/LP CACNA1G variants showed that more than 80% of P/LP variants exhibited reduced current density (CD). Deactivation kinetics (τDeact) provided complementary discriminatory information by identifying gating abnormalities in variants with preserved CD. Application of this dual-metric framework to five VUS identified in Australian patients revealed two variants (Cav3.1-R186Q and R1394Q) with abnormal functional profiles consistent with voltage-sensor perturbation, supporting reassessment as likely pathogenic under ACMG/AMP guidelines. The remaining VUS displayed functional properties overlapping the benign reference distribution. Conclusion: These findings establish a calibrated functional framework for generating electrophysiological evidence that supports clinical interpretation of CACNA1G missense variants under ACMG/AMP guidelines. When applied as external functional evidence, this approach improves resolution of CACNA1G-associated VUS while maintaining conservative standards for variant classification.

19

Optical genome mapping identifies source-associated structural variant differences across early-passage human iPSCs

Namvar, L.; Sedov, K.; Yang, M. J.; Hermosillo, R.; Zafar, F.; Schuele, B.

2026-05-31 genomics 10.64898/2026.05.29.728843 medRxiv

Top 0.3%

1.9%


Background: Induced pluripotent stem cells (iPSCs) are an important model for studying human diseases in vitro. However, previous studies have shown that iPSC reprogramming and extended cell culture can introduce genomic structural variants (SVs). Technologies like karyotyping, CNV microarrays, and whole-genome sequencing have limitations in resolution, sensitivity, or the ability to detect large and complex structural variants compared to optical genome mapping (OGM). OGM is a genome-wide structural variant detection method that analyzes fluorescently labeled ultra-high-molecular-weight DNA molecules to identify copy-number and balanced rearrangements. At sufficient coverage, OGM can detect SVs at approximately ≥2 kbp and identify mosaic events supported by molecule-level evidence, offering higher resolution than conventional karyotyping or SNP-array-based QC. Here, we compared iPSC clones derived from peripheral blood mononuclear cells (PBMCs) and fibroblasts (FBCs) to determine whether starting somatic cell source is associated with differences in structural variant burden and SV-type profiles after nuclear reprogramming into iPSCs. Results: We analyzed 73 low-passage iPSC clones generated from 25 parental lines using OGM. Compared with PBMC-iPSCs, FBC-iPSCs showed higher SV burden, with enrichment of duplications ≥100 kbp and more frequent overlap with protein-coding genes, fragile sites, and recurrent chromosomal hotspot regions. In contrast, PBMC-iPSCs showed fewer SVs overall and a higher proportion of clones without detectable clone-specific SVs. Conclusions: OGM provides a high-resolution approach for post-reprogramming genomic quality control by detecting clone-specific structural variants at approximately ≥2 kbp, including events below the resolution of conventional cytogenetic and SNP-array-based assays. In these early-passage iPSCs, SVs overlapped protein-coding genes, fragile sites, and recurrent culture-associated chromosomal regions, underscoring the need for clone-level genomic assessment before downstream applications. FBC-derived iPSCs showed a higher SV burden, including more frequent and larger duplications, whereas PBMC-derived iPSCs more often lacked detectable clone-specific SVs. These findings suggest that PBMC-iPSCs and FBC-iPSCs can differ in post-reprogramming SV profiles and support the use of OGM as a QC strategy during iPSC generation and selection.

20

A Foundational Exome Resource for Jordan: Dual Ancestry Admixture and Population-Specific Variants to Improve Clinical Variant Interpretation

Froukh, T.

2026-05-27 genetic and genomic medicine 10.64898/2026.05.23.26353895 medRxiv

Top 0.3%

1.8%


The genetic architecture of Middle Eastern populations is underrepresented in global genomic databases. This gap increases the rate of variants of uncertain significance (VUS) and of clinical misinterpretation of genomic data in these populations. Whole exome sequencing was conducted on 90 healthy individuals from Jordan, and the data were analysed using Principal Component Analysis (PCA) and multi-computational filtering. PCA revealed a dual-ancestry (EUR-AFR) admixture rather than a triple admixture (EUR-AFR-AMR). More than 3,500 population-specific variants (PSVs) were identified, of which 72% were singletons. Additionally, 19 variants were significantly enriched compared to the maximum allele frequencies in public global databases (Fisher's exact test with Benjamini-Hochberg false discovery rate correction, p-value < 0.05). Consequently, the results suggest reclassifying the VUS in the ECE2 gene as likely benign, and the variants with conflicting classifications of pathogenicity in the genes IL1RN and THPO as benign, based on the significant allele frequency (AF=0.0389, p-value < 0.05). Furthermore, a pathogenic ClinVar variant was identified in a healthy individual, warranting careful interpretation. The findings underscore the importance of identifying PSVs in order to minimize or even prevent clinical misdiagnosis and highlight the unique genetic signature in Jordan. The study serves as a foundational resource for precision medicine in the region.